Distance Metrics Selection Validity in Cluster Analysis
Similar resources
Cluster validity analysis using subsampling
Introduction. The word "clustering" (unsupervised classification) refers to methods of grouping objects based on some similarity measure between them. Clustering algorithms can be classified into four classes, namely partitional, hierarchical, density-based and grid-based [8]. Each of these classes has subclasses and different corresponding approaches, e.g., conceptual, fuzzy, selforg...
Validity Measure of Cluster Based On the Intra-Cluster and Inter-Cluster Distance
The k-means method has been shown to be effective in producing good clustering results for many practical applications. However, a direct algorithm of k-means method requires time proportional to the product of number of patterns and number of clusters per iteration. This is computationally very expensive especially for large datasets. The main disadvantage of the k-means algorithm is that the ...
Approximate Greedy Clustering and Distance Selection for Graph Metrics
In this paper, we consider two important problems defined on finite metric spaces, and provide efficient new algorithms and approximation schemes for these problems on inputs given as graph shortest path metrics or high-dimensional Euclidean metrics. The first of these problems is the greedy permutation (or farthest-first traversal) of a finite metric space: a permutation of the points of the s...
Cluster Validity Through Graph-based Boundary Analysis
Gaining confidence that a clustering algorithm has produced meaningful results and not an accident of its usually heuristic optimization is central to data mining. This is the issue of cluster validity. We propose here a method by which proximity graphs are used to effectively detect border points and measure the margin between clusters. With analysis of boundary situation, we design a framewor...
Cluster Analysis Through Model Selection
Clustering is an important and challenging statistical problem for which there is an extensive literature. Modelling approaches include mixture models and product partition models. Here we develop a product partition model and search algorithm driven by Bayes factors from intrinsic priors. The priors we develop for the partitions, and the number of clusters in the partition, lead to finding par...
Journal
Journal title: Scientific Journal of Riga Technical University. Computer Sciences
Year: 2011
ISSN: 1407-7493
DOI: 10.2478/v10143-011-0045-y